Abstract

Community identification algorithms have been used to enhance the quality of services as perceived
by their users. Although community identification algorithms are widely used on the Web, their application to
portals or specific subsets of the Web has not been much studied. In this paper, we propose a technique 
for local community identification that takes into account user access behavior derived from access logs 
of servers in the Web. The technique takes a departure from the existing community algorithms since it 
changes the focus of interest, moving from authors to users. Our approach does not use relations imposed 
by authors (e.g. hyperlinks in the case of Web pages). It uses information derived from user accesses 
to a service in order to infer relationships. The communities identified are of great interest to content 
providers since they can be used to improve quality of their services. We also propose an evaluation 
methodology for analyzing the results obtained by the algorithm. We present two case studies based on 
actual data from two services: an online bookstore and an online radio. The case of the online radio
is particularly relevant because it highlights the ability of the proposed algorithm to find
communities in an environment (i.e., a streaming media service) that has no links representing relations
imposed by authors.

1 Introduction 

Community identification algorithms have been extensively used as a way to improve the quality perceived 
by users navigating through the Web. Search engines have incorporated this kind of technology as a source 
of information for their ranking algorithms and new applications, such as automatic directory creation. 
Furthermore, community identification studies have proven to be of great value to researchers trying to 
increase their understanding of the information society.

The application of community identification algorithms to local communities, such as those that interact with
portals or use specific services on the Web, has not been much studied. The direct application of existing
algorithms to local community identification does not yield relevant results. The main reason is the
difference between the processes associated with service creation at the two levels: local and global. The
creation of services in the Web as a whole, the global context, is governed by distributed and uncoordinated
processes. For instance, someone's decision to reference a page authored by someone else does not have
to go through any regulatory agency and does not need that author's authorization. Therefore, the majority
of links in the Web can be considered to carry a semantics of reputation.
In contrast, portals are created in a centralized and coordinated manner. Their structures are created for
navigational and business purposes, leading to a completely different link structure. This is why current
community identification algorithms do not provide good results in that type of environment.

The availability of user access information in a local context is another important fact that
should be noted. Combining community identification algorithms with user access information
would be very valuable to content providers, who could then provide specific services to specific communities.

The inclusion of user access patterns in the community discovery process also allows us to infer
communities even from a source that does not have explicit relationship information. Neither the books of an
online bookstore nor the games provided by an ISP are explicitly related, so such services can take advantage
of this technique. As the Web evolves, new kinds of services whose objects are not explicitly related are created and made
available to users, accentuating the need for algorithms designed to work with evidence other than
link information. Examples include streaming media and game services.

This work proposes and evaluates a technique for local community identification based on user
access patterns. Our approach starts from a well-known community identification algorithm, the
Hyperlink-Induced Topic Search (HITS). We then propose a way of transforming user access information into a
graph-based structure to be used jointly with the HITS algorithm. A methodology to evaluate communities 
that takes into account the semantic meaning associated with each community is also supplied. In order 
to exemplify the benefits of our approach, we show two case studies based on services available on the 
Internet. 

The paper is organized as follows: in Section 2 we present the related work. Section 3 presents the local
community identification algorithm and proposes a methodology to evaluate the results. Section 4 presents
two case studies, based on actual logs from real online services. Section 5 presents the concluding remarks
and future work.



2 Related Work 

A considerable amount of research has been conducted on community identification over the Web. Most
of the approaches focus on analyzing text content, using the vector-space models common in
Information Retrieval, the hyperlink structure connecting the pages, the markup tags
associated with the hyperlinks, or a combination of these sources of information. Therefore,
they are restricted to objects that contain implicit information provided by the authors. Our work, on the
other hand, is based solely on user access behavior.

Besides, we consider community identification applied to a local context instead of the whole
Web. Our approach aims to adapt the graph-based community identification algorithm described in [10].
Some modifications to [10] that take user information into account have already been proposed in [15].
However, that work did not focus on the community identification capabilities of [10] and also
considered a different representation of user patterns.

Another relevant aspect of our work is the proposal of a community evaluation methodology that can
be applied to other previously proposed techniques for comparison purposes. Most of the
comparison methodologies proposed so far are based on the disjunction and coverage of the communities, without
taking semantic meaning into account.



3 Local Community Identification 
3.1 The HITS Algorithm 

The HITS algorithm was initially proposed as a method to improve the quality of searches on the Web.
It takes answers to a query from a text-based search engine and changes the ranking of these Web pages
considering the underlying hyperlinked structure connecting them. This approach, known as link
analysis, was also the basis for several other related studies. The links are considered as a way
to represent correlations between pages, conferring a certain reputation/quality on a Web page pointed to by
another.

The algorithm identifies pages that provide valuable information for a given query and also pages
that are sources of good links for the query. These two kinds of pages are respectively called authorities
and hubs. The query in the search application is used to limit the scope of the Web considered by the
algorithm at each execution. Therefore, it limits its coverage to a certain subject expressed by a user in
terms of his/her query.

The idea behind HITS is to identify hubs and authorities through a mutually reinforcing relationship
that exists between the pages. This relationship may be expressed as follows: a good hub is a page that points
to good authorities and a good authority is a page that is pointed to by good hubs. This approach is very
successful for the search application since it avoids some of the weaknesses of other simple link
analysis strategies such as indegree and outdegree ranking [12].



An iterative algorithm may be used to break the circularity of the mutually reinforcing relationship
and to compute authority and hub weights for each page. Thus, each page $p$ has associated with it an
authority weight $a_p$ and a hub weight $h_p$. These weights form a ranking of the pages ranging from good
hubs/authorities, with high $h_p$/$a_p$ values, to bad ones, with low $h_p$/$a_p$. The weights are iteratively evaluated
by the following procedure:

\[ a_p = \sum_{q \to p} h_q \]

\[ h_p = \sum_{p \to q} a_q \]

where $p \to q$ indicates the existence of a link from $p$ to $q$.

Let $A$ denote the adjacency matrix of the Web page subgraph to be considered by the HITS algorithm,
i.e., $A[p, q]$ is equal to one if there is a link from $p$ to $q$ and zero otherwise. The process of computing the
weights may be rewritten as:

\[ a = A^T h = A^T A a \]

\[ h = A a = A A^T h \]

where $a$ and $h$ are arrays storing authority and hub information for all the pages considered. Then, it can
be shown that, when normalized after each iteration, the authority and hub arrays, $a$ and $h$, converge to the
principal eigenvectors of $A^T A$ and $A A^T$, respectively.
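
The iterative procedure can be illustrated with a minimal Python sketch. It is not taken from the original work: the function name `hits`, the fixed number of iterations and the toy graph are assumptions made only for illustration, and the weights are normalized to unit length after every step so that they stay bounded.

```python
from math import sqrt

def hits(adj, iterations=50):
    """Iteratively compute authority and hub weights.

    adj[p][q] is 1 if page p links to page q, and 0 otherwise.
    Returns (authority, hub), each normalized to unit Euclidean length.
    """
    n = len(adj)
    auth = [1.0] * n
    hub = [1.0] * n
    for _ in range(iterations):
        # a_p = sum of h_q over all pages q that link to p (a = A^T h)
        auth = [sum(adj[q][p] * hub[q] for q in range(n)) for p in range(n)]
        # h_p = sum of a_q over all pages q that p links to (h = A a)
        hub = [sum(adj[p][q] * auth[q] for q in range(n)) for p in range(n)]
        # Normalize so the weights do not grow without bound.
        a_norm = sqrt(sum(x * x for x in auth)) or 1.0
        h_norm = sqrt(sum(x * x for x in hub)) or 1.0
        auth = [x / a_norm for x in auth]
        hub = [x / h_norm for x in hub]
    return auth, hub

# Toy graph (hypothetical): pages 0 and 1 link to pages 2 and 3; page 4 links only to 3.
adjacency = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
authority, hub = hits(adjacency)
```

In this toy graph, page 3 receives the highest authority weight because it is pointed to by all three hubs, while pages 0 and 1 are better hubs than page 4 because they point to both authorities.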

Although the initial work on HITS only considered the principal eigenvectors of $A^T A$ and $A A^T$, an
extension to it proposed using the same approach to identify communities of pages over the whole
Web. The approach of the authors is to use the non-principal eigenvectors of the matrices $A^T A$ and $A A^T$ in
order to identify other communities of Web pages. Thus, by computing the non-principal eigenvectors we
obtain other $a$ and $h$ arrays, each identifying another community. An implicit ranking of the communities
can be derived by this method: the principal eigenvectors identify the most important community over
the pages, the second eigenvectors reveal the second most important community over the pages,
etc.
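
As a rough illustration of how several communities can be read from successive eigenvectors, the sketch below extracts the two leading eigenvectors of $A^T A$ for a toy graph. Power iteration with deflation is used here only because it needs no external library; it is an assumption of this sketch, not the method of the original work, and in practice a numerical eigensolver would normally be preferred. All names (`authority_matrix`, `top_eigenpairs`, `top_members`) are hypothetical.

```python
from math import sqrt

def authority_matrix(adj):
    """Return M = A^T A; M[i][j] counts the nodes that link to both i and j."""
    n = len(adj)
    return [[sum(adj[k][i] * adj[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def top_eigenpairs(matrix, count, iterations=200):
    """Leading eigenpairs of a symmetric positive semidefinite matrix.

    Uses power iteration followed by deflation (an illustrative choice only).
    """
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(count):
        vec = [1.0 + 0.01 * i for i in range(n)]  # fixed, non-uniform start
        value = 0.0
        for _ in range(iterations):
            product = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = sqrt(sum(x * x for x in product))
            if norm == 0.0:
                break
            vec = [x / norm for x in product]
            value = norm
        pairs.append((value, vec))
        # Deflation: remove the component just found so that the next pass
        # converges to the following eigenvector.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs

def top_members(vector, k):
    """Indices of the k entries with the largest absolute weight."""
    order = sorted(range(len(vector)), key=lambda i: -abs(vector[i]))
    return sorted(order[:k])

# Toy graph (hypothetical): nodes 0-2 all link to 3 and 4; nodes 5-6 link to 7 and 8.
n = 9
adjacency = [[0] * n for _ in range(n)]
for source in (0, 1, 2):
    for target in (3, 4):
        adjacency[source][target] = 1
for source in (5, 6):
    for target in (7, 8):
        adjacency[source][target] = 1

(first_value, first_vec), (second_value, second_vec) = top_eigenpairs(
    authority_matrix(adjacency), count=2)
```

Here the first eigenvector concentrates its weight on nodes 3 and 4 (the community with three hubs) and the second on nodes 7 and 8, which mirrors the implicit ranking of communities described above.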

Our approach applies the methodology to find communities on a graph, introduced by Kleinberg, to 
another context. Our goal is to identify communities of users that share a common interest, while accessing 
a service. User access patterns are used in order to infer relationships between them. The generation of 
the graph representing the relationships and its application to the HITS algorithm is described in the next 
Section. 



3.2 Community Identification Process 

Usually, the information used by the community identification algorithms is provided by the authors of 
the services. Thus the communities identified reflect the authors' perception of the world. For instance, 
when these methods are applied to the Web, they use the information made explicit by the links connecting
the pages as a way to infer relationships between them. This kind of information, such as links or textual
information, is provided when the Web page is created and is influenced by the author's unique view of
the object.

In the case of local community identification, i.e., community identification restricted to a specific
service, we consider the users' viewpoint. The author's point of view can be taken from the centralized
process that creates the service, which is directly reflected in the service organization (e.g., the navigational
structure of the service).

Although user access patterns are of great interest to local community identification, it is not
straightforward how they should be treated.

At first sight, we would consider the objects as being the source of a unique view about a certain
subject. This procedure is successful when considering the whole Web, since each page represents a view
about a subject provided by its author. Using objects as nodes of the graph and links derived from user
access data would not represent such a unique interpretation of the data, since different users have different
interests when accessing an object. Therefore, we propose to use the accesses to a service in order to create 
a graph that maps the relationship between its users and not between its objects. This is a departure from 
the traditional approach taken by community identification algorithms. 

Most services on the Web keep log files that record user requests to their objects. These logs have information
about the objects requested by each user and some additional information such as the time each request was issued or
the status returned. Through the analysis of these logs, we can group requests into user sessions that are
delimited by periods of user inactivity [14]. In this work, the sessions are considered to be the
basic unit expressing a user interest, although other basic units such as the user itself or a single request
could be used with a few adaptations. Each session present in the log is considered to be a node of a
graph that represents user access patterns. The connection between any two nodes $p$ and $q$ is directed and the
weight, $S[p, q]$, between related nodes is computed by:



where $O_n$ represents the set of objects accessed in the $n$th session.
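
The construction of the session graph can be sketched as follows. Two points are assumptions of this sketch and not statements about the original method: the log is taken to be a list of hypothetical `(user, timestamp_in_seconds, object_id)` tuples, and the directed weight is taken to be the fraction of the objects of session $p$ that were also accessed in session $q$, i.e. $|O_p \cap O_q| / |O_p|$, chosen purely for illustration.

```python
from collections import defaultdict

def sessionize(requests, timeout=1800):
    """Group (user, timestamp_in_seconds, object_id) requests into sessions.

    A new session starts when a user is inactive for more than `timeout` seconds.
    Returns a list of sets, each holding the objects accessed in one session.
    """
    by_user = defaultdict(list)
    for user, timestamp, obj in requests:
        by_user[user].append((timestamp, obj))
    sessions = []
    for user in sorted(by_user):
        last_time = None
        for timestamp, obj in sorted(by_user[user]):
            if last_time is None or timestamp - last_time > timeout:
                sessions.append(set())
            sessions[-1].add(obj)
            last_time = timestamp
    return sessions

def session_graph(sessions):
    """Build a directed weight matrix S over sessions.

    ASSUMPTION (illustrative only): S[p][q] = |O_p & O_q| / |O_p|.
    """
    n = len(sessions)
    weights = [[0.0] * n for _ in range(n)]
    for p in range(n):
        for q in range(n):
            if p != q and sessions[p]:
                shared = len(sessions[p] & sessions[q])
                weights[p][q] = shared / len(sessions[p])
    return weights

# Hypothetical log entries: (user, seconds since start of log, object requested).
log = [
    ("u1", 0, "book_a"), ("u1", 600, "book_b"), ("u1", 5000, "book_c"),
    ("u2", 100, "book_a"), ("u2", 200, "book_b"), ("u2", 300, "book_d"),
]
sessions = sessionize(log)
S = session_graph(sessions)
```

With a 30-minute timeout, user `u1` yields two sessions and user `u2` one; the first session of `u1` points to the session of `u2` with weight 1.0 because all of its objects were also accessed there.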

After constructing matrix $S$, which expresses the relationship between the user sessions, we identify the
communities by applying the HITS algorithm with $S$ in place of $A$. The authority weight of a session $s$ in
a community $c$, given by $a_{c,s}$, is used to characterize the communities. The intuition behind this
procedure is that the authority weight of a session is related to the authority weight of the objects requested 
in it. The subject treated by each community is implicitly defined by its members, i.e., sessions with high 
authority weights. 

3.3 Community Evaluation and Comparison 

After user communities have been identified, a way to express their interrelationship must be provided. 
This comparison, in terms of their similarities/dissimilarities, is of great value to service providers since it 
is this sort of information that would help them to design better services. For instance, one might decide to 
provide a personalized service to its users based on information stored in the communities. 

The weights $a_{c,s}$, associated with each session/community pair, generate a ranking of the sessions within
the communities. Based on the rankings, our analysis tries to identify good and bad sessions for each
community. These rankings pass through a series of data analysis techniques in order to provide
interrelationships and interpretations for the communities.

We use the Spearman rank correlation coefficient [13] to compare two communities. This correlation
coefficient is a non-parametric (distribution-free) rank statistic proposed by Spearman as a measure of the 
strength of the correlation between two variables through the analysis of the rankings imposed by them. 
The Spearman method can be used to calculate the correlation between any communities a and b by: 



where $r$ is the difference in rank position of corresponding sessions. The $s_{a,b}$ value can be considered
to be an approximation to the exact correlation coefficient that could be found if the authority weights for 
each session were considered. 

The Spearman rank correlation varies from -1 to 1. Completely opposite rankings are indicated by
-1 while equal rankings are represented by 1. We define the distance $d_{a,b}$ as the separation of two
communities and calculate it as follows:

\[ d_{a,b} = 1 - s_{a,b} \]



The above definition is useful for visualization purposes and for analyzing the communities, as shown in
Section 4.
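
The comparison of two communities can be sketched in Python. The correlation below uses the standard Spearman formula for $n$ sessions without ties, $s = 1 - 6 \sum r^2 / (n (n^2 - 1))$, which is assumed here to be the form intended; the function names and the toy weights are illustrative only.

```python
def ranks(weights):
    """Rank position of each session (1 = highest weight); ties broken by index."""
    order = sorted(range(len(weights)), key=lambda s: -weights[s])
    position = [0] * len(weights)
    for rank, session in enumerate(order, start=1):
        position[session] = rank
    return position

def spearman(weights_a, weights_b):
    """Spearman rank correlation between two rankings of the same sessions."""
    rank_a, rank_b = ranks(weights_a), ranks(weights_b)
    n = len(rank_a)  # requires at least two sessions
    sum_sq = sum((x - y) ** 2 for x, y in zip(rank_a, rank_b))
    return 1.0 - 6.0 * sum_sq / (n * (n * n - 1))

def distance(weights_a, weights_b):
    """Distance between two communities, d = 1 - s."""
    return 1.0 - spearman(weights_a, weights_b)

# Hypothetical authority weights of four sessions in three communities.
community_a = [0.9, 0.5, 0.3, 0.1]
community_b = [0.8, 0.6, 0.2, 0.05]   # same ordering as community_a
community_c = [0.1, 0.3, 0.5, 0.9]    # opposite ordering
```

Communities that rank the sessions identically are at distance 0, and communities with completely opposite rankings are at the maximum distance of 2.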



Another important aspect of community evaluation is the ability to discover the subjects
represented by each community. A simple, yet robust, method is to take into consideration the objects accessed
by the users in each session. We split the sessions into three disjoint sets with respect to each community:
the set of members, the set of non-members and the rest of them. The set of members is constituted by the 
top n sessions of the community ranking. The non-members set is formed by the sessions occupying the 
lowest n positions of the ranking. The remaining sessions are included in a third set not considered through 
the rest of the evaluation process. The value of n should be chosen based on the level of specificity desired 
or on the information available about the objects. 

After classifying sessions as members, non-members and indifferent, we proceed by evaluating
positively the objects accessed by the sessions belonging to the members' set and negatively the objects
accessed by the sessions belonging to the non-members' set. The weight associated with each
object/session pair is calculated by a measure based on a tf-idf approach, usually employed by information
retrieval techniques. The frequency (tf) of an object within a session represents its importance in the scope
of the session, while the distinction capability (idf) provided by the object is computed by:



where $N$ represents the total number of sessions and $N_o$ represents the number of sessions in which the
object $o$ was accessed.
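
A minimal sketch of this scoring step is given below. It assumes the common form $\mathrm{idf}_o = \log(N / N_o)$, which may differ from the exact expression used in the original work, and it assumes that each session is available as a list of object identifiers with repetitions. The names `split_sessions` and `object_scores` are hypothetical.

```python
from collections import Counter
from math import log

def split_sessions(authority, n):
    """Return (members, non_members): the top n and bottom n sessions by weight.

    The two sets are disjoint as long as 2 * n <= len(authority).
    """
    order = sorted(range(len(authority)), key=lambda s: -authority[s])
    return order[:n], order[-n:]

def object_scores(sessions, authority, n):
    """Score objects positively for member sessions, negatively for non-members.

    sessions[s] is the list of objects requested in session s (repeats allowed).
    ASSUMPTION: idf is computed as log(N / N_o).
    """
    total = len(sessions)
    containing = Counter()
    for objects in sessions:
        containing.update(set(objects))  # N_o: sessions in which o appears
    members, non_members = split_sessions(authority, n)
    scores = Counter()
    for sign, group in ((1.0, members), (-1.0, non_members)):
        for s in group:
            for obj, tf in Counter(sessions[s]).items():
                scores[obj] += sign * tf * log(total / containing[obj])
    return dict(scores)

# Hypothetical sessions and authority weights for one community.
sessions = [["db", "db", "cert"], ["db", "cert"], ["web"], ["web", "net"]]
authority = [0.8, 0.6, 0.1, 0.05]
scores = object_scores(sessions, authority, n=2)
```

Objects requested mostly by member sessions end up with positive scores and those requested mostly by non-members with negative scores; aggregating these scores per category gives best-ranked and worst-ranked categories for the community.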



4 Case Studies

This section presents the results obtained by the application of the proposed techniques to two different
applications: an online bookstore and an audio streaming media server providing content for an online 
radio. The objects of interest are the books for the online bookstore and the songs for the audio streaming media
server. The main reason for choosing these applications was the lack of any explicit relationship between 
objects provided by the service authors. The data comprises one week of accesses to each service. The 
dataset from the bookstore was collected from August 1st to August 7th of 1999, while the audio streaming 
media dataset was collected from January 13th to January 19th of 2002. 

The online bookstore considered here is a service specialized in Computer Science literature and
operates exclusively on the Internet. Throughout the period, the bookstore received 1.7 million requests, 50,000
of which were requests for information about books, such as authors, price, category, and reviews. Only
those types of requests were considered in this experiment. We used 30 minutes as the threshold for the
period of inactivity. As a result, we found 40,000 user sessions.

The online radio service provides a Web interface to an audio streaming media server that provides 
songs. Users can create personal radios, by specifying the songs they want to listen to, or choose a previously
stored radio. In the process of radio creation, users can listen to excerpts of the songs before they are inserted
in the radio's playlist. The streaming media server received 2.3 million requests, 662,000 of which were
requests for full songs. Only those requests were considered in this experiment because we only wanted to
capture the behavior of users who were already listening to established radios. Again, we used a 30-minute
inactivity period as the threshold. The number of sessions found was 78,000.

4.1 Online Bookstore 

The local community identification technique was applied to find the top 10 communities in the online
bookstore dataset. The communities were named C1 to C10. The qualitative analysis of the subjects
covered by each community is presented in Table 1.

Community  Best-ranked categories                                    Worst-ranked categories
C1         Certification; Databases; Reference - Education           Web Development; Networking; Programming
C2         Networking; Programming; Web Development                  Reference - Education; Certification Central; Databases
C3         Programming; Networking; Operating Systems                Microsoft; Databases; Certification Central
C4         Programming; Operating Systems; Hardware                  Databases; Microsoft; Certification Central
C5         Databases; Hardware; Digital Business & Culture           Reference - Education; Programming; Certification Central
C6         Certification; Microsoft; Networking                      Home & Office; Programming; Databases
C7         Programming; Operating Systems; Web Development           Networking; Microsoft; Certification Central
C8         Programming; Operating Systems; Web Development           Microsoft; Databases; Certification Central
C9         Databases; Web Development; Operating Systems             Programming; Microsoft; Networking
C10        Certification Central; Databases; Reference - Education   Web Development; Networking; Programming

Table 1: Qualitative analysis for the online bookstore dataset

Figure 1: Results for the online bookstore dataset

For this analysis, the first half of the sessions' ranking
was considered to be the community members and the second half the non-members. Information about
the categories that each book belongs to was collected from the Amazon online store. The weight of each
book, computed as explained in Section 3.3, was used to find the most and the least important categories
for each community. Analyzing Table 1, we can infer the following interpretation for each community:

• Community C1 is basically formed by users interested in database certification who show less interest
in network and programming questions related to Web development.

• Community C2 has users who show some interest in Web development, but whose main interests
are the networked applications involved with it. Less importance is given to database and certification
programs by the users of this community.

• Distributed computing is the main interest of community C3. This community is less related
to databases and Microsoft platforms/applications.

• Community C4 aggregates users interested in low-level programming, basically related to operating
system issues. It is interesting to note that this community is also less related to databases and
Microsoft, similar to community C3, because these platforms are, generally, less flexible.

• The main interest of users belonging to community C5 is the hardware specification of database
systems. The close relationship between this community and the management of such systems, suggested by
the interest in digital business books, is also worth mentioning.

• Community C6 is formed by users interested in network administration of Microsoft systems. The
community is mainly related to certification programs about this subject.

• Low-level programming and scripting for Web development are the main interests of users belonging
to community C7. They are not interested, however, in the network problems related to Web
development.

• Community C8 is also related to low-level Web development issues. The main difference between
C8 and C7 is that in C8 the database category is ranked low, instead of the ones
bottom-ranked in C7.

• Community C9 is also related to Web development, like C7 and C8, although the users of this
community express some interest in certification programs. This information was gathered from the
analysis of the whole set of categories for this community.

• Certification in database systems is the main concern of users belonging to community C10.

We use two data analysis techniques, Sammon's mapping and Hierarchical Clustering, to increase
our understanding of the communities. Sammon's mapping is a nonlinear projection method
closely related to metric Multi-Dimensional Scaling (MDS). This method tries to optimize a cost function
that describes how well the pairwise distances in a data set are preserved in the generated projection. The
projection derived by the use of Sammon's mapping can be seen in Figure 1(a). Figure 1(b) shows the
Hierarchical Clustering obtained by the use of the pairwise distance matrix between communities and
the complete-linkage method [16]. The complete-linkage method works as follows:

1. Assign to each community its own cluster and consider the distance between clusters to be the same 
one between the respective communities. 

2. Find the closest clusters and merge them into a single one. 

3. Compute the distances between this new cluster and each of the others. The distance between two
clusters is calculated as the longest distance between any two communities, one from each cluster.

4. Repeat steps 2 and 3 until all communities are grouped into a single cluster.

Figure 2: Results for the audio streaming media dataset

Figure 1(c) shows the distances between the clusters merged in each step of the algorithm.
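
The complete-linkage procedure can be sketched in a few lines of Python. This is a direct, unoptimized illustration rather than the implementation used to produce the figures; the function name `complete_linkage` and the toy distance matrix are assumptions. The returned list records, for each step, the two clusters merged and the distance at which they were merged.

```python
def complete_linkage(dist):
    """Agglomerative clustering with complete linkage.

    dist[i][j] is the distance between communities i and j.
    Returns a list of merges (cluster_a, cluster_b, distance) in merge order.
    """
    clusters = [(i,) for i in range(len(dist))]  # step 1: one cluster per community
    merges = []

    def cluster_distance(a, b):
        # Longest distance between any two communities, one from each cluster.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best  # step 2: the closest pair of clusters
        merged = tuple(sorted(clusters[x] + clusters[y]))
        merges.append((clusters[x], clusters[y], d))
        # Step 3 is implicit: distances to the new cluster are recomputed on demand.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges

# Hypothetical pairwise distances between four communities.
distances = [
    [0.0, 0.2, 0.9, 1.0],
    [0.2, 0.0, 0.8, 1.1],
    [0.9, 0.8, 0.0, 0.3],
    [1.0, 1.1, 0.3, 0.0],
]
merges = complete_linkage(distances)
```

For this toy matrix the merge distances are 0.2, 0.3 and 1.1; a large jump in the last merge indicates two well-separated groups, which is the kind of reading applied to the merge distances in the analysis.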

As expected, both methods give similar results. They group together communities that are closely
related, like C1 and C10, and place apart communities that have no relation, like C6 and C5. It is even
more interesting to notice that communities like C8 and C9, which at first glance seem similar, are correctly
separated by both methods.
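
For reference, the cost function that Sammon's mapping minimizes is usually written as Sammon's stress, $E = \frac{1}{\sum_{i<j} d^*_{ij}} \sum_{i<j} \frac{(d^*_{ij} - d_{ij})^2}{d^*_{ij}}$, where $d^*_{ij}$ is the original distance and $d_{ij}$ the distance in the projection. The sketch below only evaluates this stress for a given two-dimensional projection; it does not perform the optimization, and the function name and toy data are assumptions.

```python
from math import hypot

def sammon_stress(dist, points):
    """Sammon's stress of a 2-D projection.

    dist[i][j] is the original distance between items i and j (must be > 0
    for i != j); points[i] is the (x, y) position of item i in the projection.
    """
    n = len(dist)
    total = 0.0
    error = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            original = dist[i][j]
            projected = hypot(points[i][0] - points[j][0],
                              points[i][1] - points[j][1])
            total += original
            error += (original - projected) ** 2 / original
    return error / total

# Hypothetical distances that a 3-4-5 right triangle reproduces exactly.
original = [[0.0, 3.0, 4.0], [3.0, 0.0, 5.0], [4.0, 5.0, 0.0]]
exact = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
collapsed = [(0.0, 0.0), (3.0, 0.0), (0.0, 0.0)]
```

A projection that preserves every pairwise distance has zero stress, while collapsing two items onto the same point raises it, so lower values indicate a more faithful picture of the distances between communities.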

The top clusters of the hierarchy shown in Figure 1(b) separate the dataset into two very distinct groups.
The first one, formed by (C2, C3, C4, C5, C7), represents a group where most of the users are interested in
low-level questions, like programming and networking, usually related to operating systems. The other one,
formed by (C1, C6, C9, C10), represents a group of users mostly interested in certification programs whose
interests vary from network administration to Web development. The dispersion of interests found
in the latter group was automatically identified. This can be seen from the higher cluster distance
at which this group is merged, Figure 1(c), and also from the dispersion of the points on the Sammon's mapping
projection, Figure 1(a).

The analysis of community clusters can be extended to the whole hierarchy with similar results. The
Sammon's mapping provides a comprehensive visualization of the relations expressed by the hierarchy,
and the use of both techniques together is a good starting point for the analysis of community data. The quality
of the results obtained in the analysis is evidence of the applicability of the distance metric based on
the Spearman correlation.



4.2 Audio Streaming Media Server 

The same methodology used in the previous section was applied to the audio streaming media dataset.
The top 10 communities (C1 to C10) in the dataset were identified. The qualitative analysis of the
styles covered by each community and a short explanation of some Brazilian music styles are presented
in Table 2. For this analysis, only the top and bottom 100 sessions were considered to be the elements
of the members' and non-members' sets. Unlike the online bookstore, we did not have access to a unique
identifier for the songs played. The information available was the title of the songs, CDs and artists, making
the process of categorizing data a time-consuming task. Data about the styles of the songs accessed in the
considered sessions were collected from Amazon and Submarino, a major online store in Brazil. The
Sammon's mapping for this dataset is presented in Figure 2(a). Figures 2(b) and 2(c) show the results
obtained by the Hierarchical Clustering method when applied to this dataset.

As in Section 4.1, we can make observations about the identified communities. For example,
communities C9 and C10 represent users interested in international music styles who do not pay
much attention to Brazilian music. Communities C3 and C10, which are located apart in both representations,
seem to represent different interests of their users.

Community  Best-ranked categories                                      Worst-ranked categories
C1         Brazilian Pop & Rock; Samba, Axe, Pagode; Soundtracks       MPB; World Music; International Rock
C2         Samba, Axe, Pagode; Sertanejo; MPB                          World Music; International R&B, Soul; International Rock
C3         Soundtracks; International Pop; International Rock          Brazilian Pop & Rock; Samba, Axe, Pagode; Sertanejo
C4         Forró; MPB; International Rock                              Teen Pop; Samba, Axe, Pagode; Sertanejo
C5         Forró; Sertanejo; Brazilian Pop & Rock                      World Music; Blues; MPB
C6         Forró; Sertanejo; Samba, Axe, Pagode                        Brazilian Pop & Rock; World Music; MPB
C7         Brazilian Pop & Rock; International Rock; Samba, Axe, Pagode  International R&B, Soul; World Music; MPB
C8         International Pop; Soundtracks; International Rock          MPB; Forró; Brazilian Pop & Rock
C9         International Rock; Reggae; Orchestras and Easy Listening   MPB; Samba, Axe, Pagode; Sertanejo
C10        International Pop; International R&B, Soul; Soundtracks     Sertanejo; Classics; MPB

Axe: popular Afro-Brazilian style from the state of Bahia, closely related to carnival.
Pagode: popular Brazilian style derived from Samba.
Forró: catchy dance music from the Northeast of Brazil.
Sertanejo: Brazilian country music.
MPB: commonly used for Brazilian pop coming after the Bossa Nova style.

Table 2: Qualitative analysis for the audio streaming media dataset

Although the same kind of analysis based on similarities
of communities and their interests can be done for this dataset, we want to point out other dataset features
identified by the algorithm without relying upon any explicit information. One of them is related to the
structure of the phonographic industry in Brazil and the other one is related to the specificity of
each dataset.

Even for an untrained observer, Table 2 shows that users of the online radio exhibit a strong interest in
local music. Many of the categories cited are Brazilian music, even though all top international albums
were also available. This fact is extremely important since it reflects what happens every day on Brazilian
streets. The IFPI Music Piracy Report shows that over 50% of the piracy in Brazil is domestic and,
therefore, many questions concerning the survival of the local phonographic industry are being
raised. The algorithm's capability of confirming a behavior observed in society is very interesting
since it can shed light on new questions.

The specificity level of each dataset is different and the algorithm is able to reflect this fact. The online
bookstore is specialized in Computer Science while the online radio service provides access to different
music styles from different nationalities. The slightly higher distances observed at each merge step
for the online radio are evidence of the latter, Figures 1(c) and 2(c). We can also see in the Sammon's mappings that the
communities found in the audio streaming media dataset, Figure 2(a), are more spread out than those of the
bookstore dataset, Figure 1(a).

5 Concluding Remarks 

The methodology proposed offers several advantages over the graph-based algorithms in their pure form
when applied to the context of local community identification. The communities identified represent the
users' perception of the information provided by the services, and this understanding gives service providers
a great opportunity for service improvement.

An evaluation methodology based on available data analysis techniques was also proposed. The evaluation
technique is based on tf-idf ranking of occurrences and the Spearman rank correlation. The former is used to
provide the focus of each community and the latter to derive a pairwise distance metric. The benefits of these
methods are exemplified by the case studies, based on actual data from two real services available on the Web.

The results obtained in this paper are encouraging and show that the proposed techniques and metrics
are promising for characterizing the interests of users accessing a service on the Web. Yet, this is just an
introductory study and we must devote much attention to other possible metrics, datasets and applications
of the proposed technique. The temporal emergence of communities and their evolution is also of great
interest. We also intend to compare our results with other methods used for similar purposes.